Video-attribute-information output apparatus, video digest forming apparatus, computer program product, and video-attribute-information output method

ABSTRACT

An attribute-information-area extracting unit extracts an attribute information area in which attribute information is displayed and which does not change between certain frames of adjacent scenes obtained by dividing a video content by a scene dividing unit. A character-area extracting unit extracts, from the attribute information area, character areas in which video attribute information in individual characters, which is metadata of the video content, is present, and a character-area-meaning assigning unit assigns meanings to the character areas. A character-area reading unit reads the video attribute information from the character areas to which the meanings are assigned, thereby outputting the video attribute information.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2007-008909, filed on Jan. 18, 2007; the entire contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a video-attribute-information output apparatus, a video digest forming apparatus, a computer program product, and a video-attribute-information output method.

2. Description of the Related Art

With the recent advancement of the information infrastructure of multi-channel broadcasting, a large amount of video contents has been distributed. On the video recorder side, because the widespread use of hard disk recorders and personal computers equipped with a tuner enables video contents to be recorded as digital data, the contents can be viewed in various manners.

The manners of viewing on a video recorder include a video digest technique, with which a long video program can be shortened into digest form. For instance, JP-A 2005-109566 (KOKAI) describes a method of making a digest of a sports commentary program, with which scenes are extracted from a metadata-added video picture in accordance with the user's preference, thereby obtaining a digest. The metadata may be input as events of individual players' plays in the game, together with time information thereof, and the metadata needs to be manually input while referring to the video picture.

JP-A 2000-132563 (KOKAI) describes a method of assisting in metadata inputting. In particular, an image region for displaying information such as the inning and score (score information area) in a baseball game broadcast is designated and entered as a key image. When it is determined that a rate of change in the key image exceeds a certain standard level upon an event such as the end of an inning, the image of this moment is displayed.

According to the method of JP-A 2005-109566 (KOKAI), however, while the use of metadata necessary for digest formation has improved the accuracy of the summary, a problem of labor and costs arises because all the metadata has to be manually input.

The technique of JP-A 2000-132563 (KOKAI) may save some of the labor of inputting metadata for a baseball game broadcast.

However, with the technique of JP-A 2000-132563 (KOKAI), the image region for displaying inning and score information still needs to be manually designated, and images detected as events also have to be manually read. Thus, the problem of labor for the manual designation of an image region for each program, and of labor and costs for the manual input of detected innings and scores, is yet to be solved.

SUMMARY OF THE INVENTION

According to one aspect of the present invention, a video-attribute-information output apparatus includes a scene dividing unit that detects a change in scenes where there is little similarity between frames of a video content, and divides the video content into a plurality of scenes; an attribute-information-area extracting unit that extracts an attribute information area in which attribute information is displayed and which has no change between specific frames of adjacent scenes that are obtained by dividing by the scene dividing unit; a character-area extracting unit that extracts character areas in which video attribute information in individual characters that is metadata of the video content is present, from the attribute information area extracted by the attribute-information-area extracting unit; a character-area-meaning assigning unit that assigns meanings to the character areas extracted by the character-area extracting unit, by referring to rule information that specifies the meanings for the character areas; and a video-attribute-information output unit that reads the video attribute information from the character areas to which the meanings are assigned, and outputs the video attribute information.

According to another aspect of the present invention, a video digest forming apparatus includes the video-attribute-information output apparatus; an importance calculating unit that calculates a level of importance for each event included in the video attribute information output from the video-attribute-information output apparatus; a video segment selecting unit that selects a video segment of the video content to be included in a digest video picture in accordance with the level of importance calculated by the importance calculating unit; a description-use event selecting unit that selects a description-use event from a list of events to use in preparation of a description that is to be included in the digest video picture; a description preparing unit that prepares the description from the description-use event selected by the description-use event selecting unit; and an integrating unit that combines the selected video segment and the description, and forms digest video information that includes both of the video segment and the description.

According to still another aspect of the present invention, a video-attribute-information output method includes detecting a change in scenes where there is little similarity between frames of a video content, and dividing the video content into a plurality of scenes; extracting an attribute information area in which attribute information is displayed and which has no change between specific frames of the divided adjacent scenes; extracting character areas in which video attribute information in individual characters that is metadata of the video content is present, from the extracted attribute information area; assigning meanings to the extracted character areas by referring to rule information that specifies the meanings for the character areas; and reading the video attribute information from the character areas to which the meanings are assigned, and outputting the video attribute information.

A computer program product according to still another aspect of the present invention causes a computer to perform the method according to the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a structure of a video digest forming apparatus according to a first embodiment of the present invention;

FIG. 2 is a block diagram for showing an overview of the structure of the video digest forming apparatus;

FIG. 3 is a block diagram for showing an overview of the structure of a video-attribute-information generating unit;

FIG. 4 is a flowchart for briefly showing a process of automatically generating video attribute information;

FIG. 5 is a schematic diagram for showing an example of a screen of a baseball game broadcast;

FIG. 6 is a diagram for explaining a typical scene;

FIG. 7 is a schematic diagram for explaining a method of extracting a feature from an input video picture;

FIG. 8 is a flowchart of a typical scene extracting process;

FIG. 9 is a flowchart of an attribute-information-area extracting process;

FIG. 10 is a schematic diagram for explaining a process of extracting overlapping portions among frames;

FIG. 11 is a schematic diagram for explaining a process of extracting an attribute information area;

FIG. 12 is a flowchart of a process of extracting a character area;

FIG. 13 is a schematic diagram for explaining a process of extracting a character area;

FIG. 14 is a schematic diagram for explaining a process executed on overlapping regions;

FIG. 15 is a schematic diagram of an example of rule information;

FIG. 16 is a schematic diagram of an example of baseball score information;

FIG. 17 is a block diagram for showing an overview of the structure of a video-attribute-information generating unit according to a second embodiment; and

FIG. 18 is a schematic diagram for explaining an example of display highlighting.

DETAILED DESCRIPTION OF THE INVENTION

A first embodiment of the present invention is explained with reference to FIGS. 1 to 16. According to the present embodiment, a personal computer is adopted for the video digest forming apparatus.

FIG. 1 is a block diagram of a structure of a video digest forming apparatus 1 according to the first embodiment of the present invention. The video digest forming apparatus includes a central processing unit (CPU) 101 that executes information processing, a read only memory (ROM) 102 that stores therein a BIOS and the like, a random access memory (RAM) 103 that stores therein various kinds of data in a rewritable manner, a hard disk drive (HDD) 104 that functions as various databases and also stores therein various programs, a medium driving device 105, such as a DVD drive, that retains information by use of a recording medium 110, delivers information to the outside, and acquires information from the outside, a communication controlling device 106 that communicates information to other outside computers by way of a network 2 by establishing communications, a displaying unit 107, such as a liquid crystal display (LCD), that presents the progress and results of a process to the operator, an input unit 108, such as a keyboard and a mouse, by which the operator inputs a command and information to the CPU 101, and the like. A bus controller 109 controls transmission and reception of data among these units to bring the video digest forming apparatus 1 into operation.

When the user turns on the video digest forming apparatus 1, the CPU 101 starts a program called a loader in the ROM 102 so that a program called an operating system (OS), which manages the computer hardware and software, is read from the HDD 104 into the RAM 103 to start the OS. The OS runs programs, reads information in, and saves the information in accordance with the user's operation. Major OSs include Windows (registered trademark). An operation program that runs on an OS is called an application program. The application program does not have to be one that runs on a certain OS, but may cause the OS to execute some of the processes discussed later. Alternatively, the application program may be contained in a group of program files for certain application software or the OS.

The video digest forming apparatus 1 stores therein a video processing program as an application program in the HDD 104. From this aspect, the HDD 104 serves as a recording medium that stores therein the video processing program.

In addition, the application program installed in the HDD 104 of the video digest forming apparatus 1 is usually stored in the recording medium 110 selected from media of various systems, for example, an optical disk such as a DVD, a magneto-optical disk, a magnetic disk such as a flexible disk, and a semiconductor memory. The operation program stored in the recording medium 110 is installed in the HDD 104. For this reason, a portable recording medium 110, such as an optical information recording medium such as a DVD or a magnetic medium such as a floppy disk, may also serve as a recording medium for storing the application program. In addition, the application program may be downloaded from the outside network 2 by way of the communication controlling device 106 to be installed in the HDD 104.

When the video processing program is initiated to operate on the OS, the CPU 101 of the video digest forming apparatus 1 executes different calculation processes in accordance with this video processing program to centrally control the units of the video digest forming apparatus 1. Among the calculation processes performed by the CPU 101 of the video digest forming apparatus 1, the processes characteristic to the present embodiment are explained below.

FIG. 2 is a block diagram for showing an overview of the structure of the video digest forming apparatus 1. As shown in FIG. 2, by executing the video processing program, the video digest forming apparatus 1 realizes a video-attribute-information generating unit 15, which is a video-attribute-information output device, an importance calculating unit 16, a video segment selecting unit 17, a description-use event selecting unit 18, a description preparing unit 19, and an integrating unit 20. The numeral 10 denotes original video information, 11 denotes video attribute information, 12 denotes digest forming information, 13 denotes description preparing information, and 14 denotes digest video information.

The original video information 10 can be various video contents such as TV programs and video pictures shot by the user. According to the embodiment, the original video information 10 is assumed to be digital data. The digital data can be of any format (in general, a compressed data format such as MPEG-1 and MPEG-2). The source video can also be analog data. If this is the case, the data must be converted into digital data outside the video digest forming apparatus 1 in advance, or the video digest forming apparatus 1 must be provided with an analog/digital converting function. The video content may be of one kind or more. Any video content can be played back from any position in response to an input of identification information, such as a title and an ID, of the video contents and also either a playback start time or a frame number.

The video attribute information 11 can be various kinds of attribute information (metadata) relating to a video content. The video attribute information 11 includes a list of events in which any incidents that occur in a video content are listed as events. As an event, information on the name of a person or an object and the action thereof (in a baseball game, for example, “player X's home-run”) and a breakpoint in time (such as “play ball” and “end of game”) is described together with the occurrence time of the event. The occurrence time may be described as the beginning and end of a segment in a similar manner to the scene information, or as the time of the moment at which the event occurs. The attribute information may also include information that is not particularly related to the time of the video contents, such as a type of sport, the name of the opponent team, date and time, place, names of players, and the result of the game. A sport game is employed here as an example, but the attribute information can be described suitably in accordance with the contents. For instance, the attribute information for a drama or an informational program can be described in a similar manner by incorporating characters of the drama or topics of the program.

The digest forming information 12 includes the user's preference in forming a digest, the length of the digest video picture, and parameters of the digest algorithm. The user's preference indicates information such as keywords for finding a portion on which the user places higher priority for viewing. In a sport game, favorite teams and players may serve as the keywords. The keywords may be only subjects that the user desires to view, or may also include subjects that the user does not desire to view in addition to subjects that the user desires to view. The information may be directly input, or may be stored in an internal or external memory device so that the user is saved the labor of inputting the same conditions every time.

The description preparing information 13 indicates information including a template for preparing a description from an event in the video attribute information 11. The template is a character string obtained by a combination of fixed character strings and variable character strings, such as “at (time), chance for (team). (player)'s play!”, where the character strings in parentheses as in “(time)” show variable strings.

The digest video information 14 indicates a digest video picture prepared by the video digest forming apparatus 1. The digest video information 14 includes part of the original video content and the prepared descriptions (such as captions and narrations) of visible information using characters or the like and audible information using voices or the like. A content prepared separately from the original video content (such as a title frame) may also be included. The digest video information 14 may be in a format in which the information can be reproduced as a video content independently from the original video content, or in a format in which the information is reproduced by displaying and reproducing a description by use of the characters and voices while referring to the original video content. In the latter case, a description language such as SMIL, by which multimedia can be displayed in synchronization, may be adopted. The voices may be reproduced by incorporating a voice synthesizing technique.

The components of the video digest forming apparatus 1 are explained below.

The video-attribute-information generating unit 15 automatically generates the video attribute information 11 based on the original video information 10. The process of automatically generating the video attribute information 11 at the video-attribute-information generating unit 15 will be described later in detail.

The importance calculating unit 16 calculates the level of importance for each event included in the video attribute information 11 from the video attribute information 11 and the digest forming information 12 that are input therein. The level of importance of an event is obtained by calculating an evaluation score indicating how many of the character strings included in the event match the keywords included in the digest forming information 12. For instance, the level of importance can be obtained from the following equation (1), where the level of importance is w, and the total number of keywords is N. In the equation (1), a_(k) denotes a parameter showing the weight of the kth keyword in the digest forming information 12.

$\begin{matrix}{w = {\sum\limits_{k = 1}^{N}\; {a_{k}\, M( k )}}} & (1)\end{matrix}$

Here, the total sum is taken over k=1 to N. When the kth keyword has a match, M(k)=1, while when the kth keyword has no match, M(k)=0.

In the structure where a keyword may indicate what the user does not wish to view, M(k)=1 when the kth keyword has a match and the kth keyword is related to a desired picture. On the other hand, M(k)=−1 when the kth keyword has a match but the kth keyword is related to an undesired picture. When there is no match, M(k)=0.
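
For illustration only, equation (1) may be sketched in Python as follows. This is a minimal sketch, not the actual implementation of the importance calculating unit 16; the keyword names, weights, and event text are hypothetical examples.

    # Minimal sketch of equation (1): w is the sum over k of a_k * M(k).
    # The keywords, weights a_k, and event text below are hypothetical examples.
    def importance(event_text, keywords):
        # keywords: list of (keyword, a_k, desired) tuples. M(k) is +1 for a matched
        # desired keyword, -1 for a matched undesired keyword, and 0 for no match.
        w = 0.0
        for keyword, a_k, desired in keywords:
            if keyword in event_text:
                w += a_k if desired else -a_k
        return w

    if __name__ == "__main__":
        keywords = [("player X", 2.0, True), ("home-run", 1.5, True), ("rain delay", 1.0, False)]
        print(importance("player X's home-run", keywords))   # prints 3.5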

The video segment selecting unit 17 selects a segment of the original content that is to be included in the digest video picture, in accordance with the calculated level of importance.

The description-use event selecting unit 18 selects an event for preparing descriptions (such as captions and narrations) having characters or voices to be incorporated in the digest video picture, from the list of events. In this selection, a predetermined event may be automatically selected, or an event may be selected explicitly by the user. For instance, events such as hits are selected for a baseball game broadcast. In principle, there are four different possible processes: a process of preparing a description only for the entire digest video picture, a process of preparing descriptions only for a certain portion of the digest video picture, a process of preparing descriptions for the entirety or part of the digest video picture and a specific portion that is not included in the digest video picture, and a process of preparing a description only for a specific portion that is not included in the digest video picture. These event selecting processes may be executed separately from the video segment selecting unit 17, or executed by use of the data acquired from the video segment selecting unit 17.

The description preparing unit 19 prepares a description (incorporating visible information including characters or audible information including voices) from the description-use event selected by the description-use event selecting unit 18 and also from the description preparing information 13. As discussed before, the description preparing information 13 includes templates for preparing descriptions. A template is a character string determined by a combination of fixed character strings and variable character strings such as “at (time), chance for (team). (player)'s play!”, where the character strings in parentheses as in “(time)” are variable. Character strings that are obtained directly or indirectly from the video attribute information 11 and the digest forming information 12 can fill in the variable strings. Multiple templates are prepared in advance in accordance with types of events and the like.
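
As a sketch of how such a template might be filled, the following Python fragment replaces each variable string, such as "(time)", with a value taken from an event; the event field names are hypothetical and the fragment is not the actual implementation of the description preparing unit 19.

    import re

    # Sketch of filling a description template; the placeholder syntax "(name)" follows
    # the example above, and the event field names are hypothetical.
    def fill_template(template, event):
        # Replace every "(name)" with event["name"], leaving unknown placeholders untouched.
        return re.sub(r"\(([^)]+)\)",
                      lambda m: str(event.get(m.group(1), m.group(0))),
                      template)

    if __name__ == "__main__":
        template = "at (time), chance for (team). (player)'s play!"
        event = {"time": "19:42", "team": "Team A", "player": "player X"}
        print(fill_template(template, event))
        # at 19:42, chance for Team A. player X's play!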

The integrating unit 20 integrates the video segment information and the description using a caption of characters or voice narration into the digest video information 14 having both information items. The timing of displaying characters and initiating an utterance is controlled to correspond with the event occurrence time and video segments. When the timing is to be brought into correspondence with the occurrence time, an appropriate time interval may be set before the occurrence time to start a display or an utterance, or the timing of the display and utterance may be adjusted so that a certain point of the utterance segment (its midpoint, for example) falls on the occurrence time of the event. When the timing is to be brought into correspondence with the video segment, the display or utterance may be made by setting an appropriate time interval from the beginning or end point of the segment. A combination of these techniques can certainly be employed, for example by bringing the caption of characters into correspondence with the video segment while bringing the voice narration into correspondence with the occurrence time.

In the integration process, the caption or the narration may be contained in the digest video by preparing a caption in a manner to be overlaid when displayed or by preparing a voice narration by voice synthesis. Alternatively, the caption and the narration may be prepared in a text form as the digest video information 14, and the overlay display and voice synthesis may be executed on the reproducing device.

In the above example, the integrating unit 20 simply combines the selected video segment information and the generated description information to output one piece of digest video information 14. However, the video segment and volume may be adjusted in this process.

According to the present embodiment, the video digest forming apparatus 1 selects a video segment that the user is interested in from the video information, in accordance with the keywords input by the user, conditions such as time, and the user's preference, thereby forming a digest video picture to which a caption or voice narration is added. In addition, because the description is provided to make up for omitted important scenes, a digest video picture that is clearly understandable can be prepared.

Hence, the video digest forming apparatus 1 according to the present embodiment allows the user to obtain and view a digest that includes only portions of the user's interest, without playing back the whole program. A portion that does not appear in the video picture may be included in the description incorporating captions and voice narrations, and the user can view such a video picture with the style of writing and tone of voice that suits the user's preference.

The process of automatically generating the video attribute information 11 at the video-attribute-information generating unit 15 is explained in detail below. FIG. 3 is a block diagram for showing an overview of the structure of the video-attribute-information generating unit 15, and FIG. 4 is a flowchart for briefly showing the process of automatically generating the video attribute information.

As shown in FIG. 3, the video-attribute-information generating unit 15 receives the original video information 10, and outputs the video attribute information 11. The video-attribute-information generating unit 15 includes a scene dividing unit 151 that serves as a means for dividing the original video information 10 into scenes, a typical scene extracting unit 152 that serves as a means for extracting images (typical scenes) shot in the same composition, an attribute-information-area extracting unit 153 that serves as a means for extracting an attribute information area, an attribute-information-area presence judging unit 154 that serves as a means for judging whether an attribute information area is present in a typical scene, a character-area extracting unit 155 that serves as a means for extracting a character area contained in the attribute information area, a character-area-meaning assigning unit 156 that serves as a means for assigning meaning to the character area, and a character-area reading unit 157 that serves as a means for reading video attribute information from the character area and outputting the video attribute information.

The overview of the process of automatically generating the video attribute information is now described. The steps of the process will be discussed later in detail. As shown in FIG. 4, first, a feature is calculated for each frame of the original video information 10, and the original video information 10 is divided into scenes in accordance with sets of sequential frames whose features are similar (by the scene dividing unit 151 at step S1). In particular, the original video information 10 is divided into scenes by detecting scene changes that show little similarity between any adjacent frames of the original video information 10.

Next, typical scenes are extracted from the scenes (by the typical scene extracting unit 152 at step S2). An attribute information area in which the attribute information is displayed is extracted from the screen of each of the typical scenes (by the attribute-information-area extracting unit 153 at step S3). Whether any attribute information is displayed in each of the typical scenes extracted at step S2 is determined (by the attribute-information-area presence judging unit 154 at step S4).

Thereafter, a character area is extracted from the typical scenes in which the attribute information is displayed (by the character-area extracting unit 155 at step S5), and meaning is assigned to the extracted character area by comparing the character area with rule information (by the character-area-meaning assigning unit 156 at step S6).

Finally, the character area is read to generate the video attribute information (by the character-area reading unit 157 at step S7), and the process is terminated.

Now, the terms used above are explained with reference to the drawings.

FIG. 5 is a schematic diagram for showing an example of a screen of a baseball game broadcast. As shown in FIG. 5, the area denoted by the numeral 201 shows “inning”, “score”, “count”, and “runners on base”. These score information items are referred to as attribute information, and the area 201 is referred to as an attribute information area. In addition, each area in the attribute information area where individual characters are displayed (for instance, a score region for the count, denoted by 202) is referred to as a character area.

FIG. 6 is a diagram for explaining images shot in the same composition, which serve as typical scenes. The numeral 301 denotes the original video information 10. Time flows in the rightward direction on the drawing sheet. The shaded areas (such as 302) show pitching scenes. A pitching scene is shot from behind the pitcher toward the batter, as illustrated in FIG. 5. In a baseball game broadcast, the screen is switched to the picture shown in FIG. 5 every time a ball is pitched, with the position and orientation of the camera staying almost the same. Therefore, a pitching scene repeatedly appears in a baseball game broadcast, as indicated by 303. Images that are shot in the same composition and repeatedly appear in a video content are defined as typical scenes.

Next, the process at each step is explained in detail.

The scene dividing process by the scene dividing unit 151 (at step S1) is explained first. The scene dividing unit 151 performs scene division by comparing the features of the video frames of the original video information 10. FIG. 7 is a schematic diagram for explaining a method of extracting a feature from an input video picture. The numeral 401 denotes video frames of the original video information 10 placed in order. The features can be extracted directly from the video frames. According to the present embodiment, however, features are extracted while the processing amount is reduced by taking samples in a temporal and spatial manner.

In the temporal sampling, some frames 402 are picked and processed from a group of video frames 401, as shown in FIG. 7. The frames may be picked up at certain intervals, or only I-pictures may be picked up from MPEG frames. The frame 403 is one of the picked-up frames. Then, the picked-up frame 403 is reduced to create a thumbnail image 404, which is subjected to the spatial sampling. The thumbnail image 404 may be created by finding an average value of pixels to reduce the size, or by decoding only the DC components of the DCT coefficients of the MPEG I-picture. Then, the thumbnail image 404 is divided into blocks, and a color histogram 405 is obtained for each block. This color histogram is determined as the feature of this frame.
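
The per-frame feature may be sketched as follows. This is a minimal Python sketch assuming the frame is an RGB array; the crude pixel-skipping thumbnail, the 4x4 block grid, and the 64-color quantization are assumed values for illustration, not parameters of the embodiment.

    import numpy as np

    # Sketch of the per-frame feature: downsample a frame to a thumbnail, divide it
    # into blocks, and take a quantized color histogram per block.
    def frame_feature(frame_rgb, blocks=4, bins_per_channel=4):
        thumb = frame_rgb[::8, ::8]                              # crude spatial sampling
        q = (thumb // (256 // bins_per_channel)).astype(int)     # quantize each channel
        color_index = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
        n_colors = bins_per_channel ** 3
        h, w = color_index.shape
        feats = []
        for by in range(blocks):
            for bx in range(blocks):
                block = color_index[by * h // blocks:(by + 1) * h // blocks,
                                    bx * w // blocks:(bx + 1) * w // blocks]
                hist, _ = np.histogram(block, bins=n_colors, range=(0, n_colors))
                feats.append(hist)
        return np.concatenate(feats)                             # feature vector of the frame

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        frame = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
        print(frame_feature(frame).shape)                        # (1024,) with the assumed values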

Then, the distance between the features of adjacent frames 403 is calculated, and the scene division is performed between the frames 403 where the distance of the features exceeds a certain standard value, or in other words, where there is little similarity therebetween. A Euclidean distance may be used as the distance of the features. The Euclidean distance d between the frame i and the frame i+1 can be expressed as the following equation (2), where, in the histogram for the ath block of the frame i, the frequency of the bth color is h_(i)(a,b).

$\begin{matrix}{d^{2} = {\sum\limits_{a}\; {\sum\limits_{b}\; ( {{h_{i}( {a,b} )} - {h_{i + 1}( {a,b} )}} )^{2}}}} & (2)\end{matrix}$
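As a sketch, equation (2) and the placement of scene boundaries where the feature distance exceeds a standard value may be written as follows in Python; the threshold value is an assumed parameter, not one fixed by the embodiment.

    import numpy as np

    # Sketch of equation (2): Euclidean distance between the block-histogram features
    # of two adjacent frames, and scene boundaries where that distance exceeds a threshold.
    def histogram_distance(feat_i, feat_next):
        return np.sqrt(np.sum((feat_i.astype(float) - feat_next.astype(float)) ** 2))

    def scene_boundaries(features, threshold):
        # Returns indices i such that a new scene starts at frame i+1.
        return [i for i in range(len(features) - 1)
                if histogram_distance(features[i], features[i + 1]) > threshold]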

The process of extracting typical scenes by the typical scene extracting unit 152 (at step S2) is now explained. FIG. 8 is a flowchart of a typical scene extracting process. As shown in FIG. 8, first, scenes are classified into groups in accordance with similarities based on the features of the individual scenes obtained by the scene dividing unit 151 (by the classifying means at step S11). In other words, one group contains only scenes that are similar to one another. The feature of the top frame of each scene may be used as the feature of a scene.

At step S12, whether “repetitive scenes” that exceed a standard value are included in the scenes of Group i (i=1, 2, 3, . . . N (initial value of i=1)) is determined. N denotes the total number of groups. The “repetitive scenes” are images that are shot in the same composition and repeatedly appear in the time series of the video picture. The determination may be made by checking whether the number of “repetitive scenes” in a group exceeds a predetermined threshold value, or whether the total hours of “repetitive scenes” in a group make up a certain percentage or more of the length of the original video picture.

When it is determined that “repetitive scenes” that exceed a standard value are included, or in other words, that the standard value is met (yes at step S12), all the scenes classified into Group i are regarded as typical scenes at step S13. The function of the determining means is thereby achieved. Then, the system proceeds to step S14.

On the other hand, when “repetitive scenes” that exceed a standard value are not included (no at step S12), the system proceeds directly to step S14.

At step S14, whether all the groups have been subjected to the process, or in other words, whether i=N, is determined. If there is any group yet to be subjected to the process (no at step S14), the value i is updated (at step S15), and the system returns to step S12. If all the groups have been subjected to the process (yes at step S14), the selected typical scenes are reordered in time (at step S16), and the process is terminated.
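
The selection of typical scenes may be sketched as follows. The greedy grouping by feature distance and the numeric thresholds are assumptions made only for illustration; the embodiment itself leaves the classification method and criterion level open.

    import numpy as np

    # Sketch of steps S11-S16: group scenes by feature similarity, keep every scene of a
    # group whose member count meets a criterion, and reorder the kept scenes in time.
    def extract_typical_scenes(scene_features, scene_times, sim_threshold, min_count):
        groups = []                                      # each group: list of scene indices
        for idx, feat in enumerate(scene_features):
            for group in groups:
                representative = scene_features[group[0]]
                if np.linalg.norm(feat - representative) < sim_threshold:
                    group.append(idx)
                    break
            else:
                groups.append([idx])
        typical = [i for g in groups if len(g) >= min_count for i in g]
        return sorted(typical, key=lambda i: scene_times[i])   # reorder in time (step S16)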

The process of extracting an attribute information area by the attribute-information-area extracting unit 153 (at step S3) is explained next. FIG. 9 is a flowchart of an attribute-information-area extracting process. As shown in FIG. 9, first, the frames before and after the boundary drawn by the scene dividing unit 151 for scene division are selected, and an overlapping portion of the frames is extracted (at step S21). The selected frames do not have to be the ones immediately before or after the boundary; frames positioned several frames or several seconds away from the boundary may be selected. All the scene boundaries in the video picture may be subjected to the process, or only the boundaries between the typical scenes and the scenes immediately before them may be subjected to the process. This is because the typical scenes (such as pitching scenes) indicate the beginning of a play, and thus attribute information (score information) is often displayed thereon. Therefore, the typical scenes are suitable for detection of an attribute information area.

FIG. 10 is a schematic diagram for explaining a process of extracting overlapping portions among frames by use of typical scenes. As shown in FIG. 10, the numeral 501 denotes a frame near the beginning of one of the typical scenes, while 502 denotes a frame toward the end of the scene immediately before the typical scene. In other words, at step S21, an overlapping portion of the frame 501 near the beginning of one of the typical scenes and the frame 502 near the end of the scene immediately before is extracted as an unchanged portion. In FIG. 10, the shaded overlapping portion 503 is extracted as an unchanged portion.

Next, pixels in the extracted overlapping portion that fall below a threshold value are obtained by binarization (at step S22).

Then, the overlapping portions extracted from all the scene boundaries are added up (at step S23), and pixels in the added overlapping portion (504 in FIG. 10) that fall below a threshold value are obtained by binarization (at step S24).

Finally, a portion having such pixels in high distribution density is extracted as an attribute information area (at step S25). FIG. 11 is a schematic diagram for explaining a method of extracting an attribute information area. An area 602 is a candidate for the attribute information area in a screen 601, and the position of the area 602 is found. Histograms having distributions as indicated by 603 and 604 are formed by projecting the screen 601 in the directions of the x-axis and y-axis, respectively. Segments 605 and 606 having frequencies higher than a threshold value are found in the histograms 603 and 604, and the area 602 thereby obtained is determined as the position of the attribute information area 201.
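
Steps S21 to S25 may be sketched as follows in Python, assuming the frames at each boundary are given as grayscale arrays. The difference and projection thresholds are assumed parameters chosen only for illustration.

    import numpy as np

    # Sketch of steps S21-S25: keep pixels that barely change across each scene boundary,
    # accumulate them over all boundaries, and locate the dense rectangle by projecting
    # the accumulated mask onto the x- and y-axes.
    def attribute_area(boundary_frame_pairs, diff_thr=10, count_thr=None, proj_ratio=0.5):
        acc = None
        for before, after in boundary_frame_pairs:            # grayscale frames, 2-D arrays
            unchanged = np.abs(before.astype(int) - after.astype(int)) < diff_thr
            acc = unchanged.astype(int) if acc is None else acc + unchanged
        if count_thr is None:
            count_thr = len(boundary_frame_pairs) // 2
        mask = acc >= count_thr                               # unchanged at most boundaries
        row_counts, col_counts = mask.sum(axis=1), mask.sum(axis=0)
        ys = np.where(row_counts > proj_ratio * row_counts.max())[0]
        xs = np.where(col_counts > proj_ratio * col_counts.max())[0]
        if xs.size == 0 or ys.size == 0:
            return None, mask                                 # no dense area found
        return (xs.min(), ys.min(), xs.max(), ys.max()), mask # bounding box of the area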

Next, the process of judging whether the attribute information area is present by the attribute-information-area presence judging unit 154 (at step S4) is explained. Whether the attribute information (score information) is displayed on each of the typical scenes extracted at step S2 is determined by comparing the overlapping portion 504 added up at step S3 with the overlapping portion (such as 503) of each typical scene and the scene immediately before. In other words, the overlapping portion 504 is compared with the overlapping portion (such as 503) of each typical scene and the scene immediately before, and when a certain number or more of pixels match, the typical scene is determined to have the attribute information (score information) displayed thereon.

Next, the process of extracting a character area by the character-area extracting unit 155 (at step S5) is explained. FIG. 12 is a flowchart of the process of extracting a character area, while FIG. 13 is a schematic diagram for explaining the process of extracting a character area. First, differences between the attribute information areas 201 of any two adjacent typical scenes are extracted (at step S31), as indicated in FIG. 12. In FIG. 13, for instance, differences 703 between the attribute information areas 201 of a typical scene 701 and a typical scene 702 are calculated.

Next, the differences are binarized in accordance with a threshold value (at step S32). In this manner, only a portion that has any change between the attribute information areas 201 of the two typical scenes is extracted. Because such a typical scene reappears at every pitch in a baseball game, changes are extracted for each ball that is pitched. Differences are obtained for other pairs of adjacent typical scenes. In FIG. 13, for instance, differences 705 are obtained between the attribute information areas 201 of the typical scene 702 and the typical scene 704.

Thereafter, portions of the attribute information areas 201 that exhibit such differences are grouped into regions (at step S33). Because the differences are binarized, the region division can be carried out by a region growing method, with which adjacent pixels having the same value are regarded as the same region. The regions obtained in this manner are determined as character areas.

Finally, based on the character areas obtained for all the typical scenes, a region list 706 of areas that have any change in the video picture is prepared (at step S34). The list includes only one area for any set of overlapping areas.
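
Steps S31 to S33 may be sketched as follows. The 4-connected flood fill below stands in for the region growing method named above; the difference threshold is an assumed parameter, and the input arrays are assumed to be grayscale crops of the attribute information area.

    import numpy as np
    from collections import deque

    # Sketch of steps S31-S33: binarize the difference between the attribute information
    # areas of two adjacent typical scenes, then grow connected regions of changed pixels.
    def changed_regions(area_a, area_b, diff_thr=30):
        changed = np.abs(area_a.astype(int) - area_b.astype(int)) > diff_thr
        labels = np.zeros(changed.shape, dtype=int)
        regions, next_label = [], 1
        for y, x in zip(*np.where(changed)):
            if labels[y, x]:
                continue
            queue, pixels = deque([(y, x)]), []
            labels[y, x] = next_label
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < changed.shape[0] and 0 <= nx < changed.shape[1]
                            and changed[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            regions.append(pixels)                    # one character area candidate
            next_label += 1
        return regions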

FIG. 14 is a schematic diagram for explaining a process executed on the overlapping regions. In this drawing, a strike count is taken as an example for a process conducted on the overlapping regions. The dotted lines are included in the drawing only for the purpose of explanation, and are not present in reality. The numeral 801 shows an area indicating a difference when the count changes from two strikes to no strike due to a change of batters. An area 802 indicates a difference when the count changes from no strike to one strike. In a similar manner, an area 803 indicates a difference when the count changes from one strike to two strikes. The area 801 matches the combination of the areas 802 and 803 and overlaps each of the areas 802 and 803. The area 801 is not included in the area list, while the areas 802 and 803, which do not overlap each other, are included therein. Areas similar to the areas 802 and 803 repeatedly appear in the video picture, but are included in the list only once.

The process of assigning meaning to a character area by the character-area-meaning assigning unit 156 (at step S6) is explained next. The character-area-meaning assigning unit 156 assigns meaning to each area of the character area list extracted at step S5.

For a baseball game, for instance, meaning is assigned to the score regions (character areas) showing the count (strikes, balls, and outs) and the score. Rule information is used for the meaning assignment and compared with the features of the score regions (character areas) to assign the matching content to the score regions. The rule information includes information on groups of the score regions (character areas), the positional relationship with other score regions (character areas), and the rough positioning within each score region (character area). A group consists of several score regions that form one meaning. The rule information relates to the number of score regions (character areas) in a group, and the color and positional relationship thereof. For instance, the strike count consists of two score regions (character areas), and thus the two regions should be dealt with as one group. For this reason, at the first step of the meaning assigning process, the regions are sorted into groups, and then at the next step, the groups are compared relatively with one another.

FIG. 15 is a schematic diagram of an example of the rule information. As shown in FIG. 15, the rule information includes types of rules 901, rules 902 for corresponding groups, and rules 903 for positional relationship. For instance, in relation to the “strike” count, a rule 902 is defined for a group that has “two regions of the same size and the same color that are horizontally adjacent to each other”. Therefore, regions are sorted into groups in accordance with their positions and colors. The difference between adjacent regions and regions in the vicinity resides in threshold values for the distance between the regions. The distance between adjacent regions is shorter than the distance between regions in the vicinity. In relation to the “strike” count, a rule 903 is also defined, specifying the relationship of relative positions with other components, “the same x coordinate (when vertically arranged) or the same y coordinate (when horizontally arranged) as the left sides of the regions for the ball and out counts”. The rules 903 for positional relationship are adopted to perform comparison, and the regions that satisfy a rule 903 are selected from among the regions sorted into groups in accordance with the rules 902, thereby completing the meaning assignment.

In a baseball game broadcast, for example, the use of such abstract rules reduces the influence of differences in screen design among broadcast stations, thereby enhancing general versatility. The rule information does not have to be a single rule pattern; there may be several patterns. For instance, a rule pattern in which the ball count is displayed in digits may be incorporated. Digits defined by the rule are ones that can be read as digits by selecting and reading one or more typical scenes on which the attribute information area 201 is displayed. Score information presented in broadcasting of a sport game has a form that is fixed to a certain extent according to the kind of sport, with little difference among broadcast stations. Thus, several patterns for each kind of sport will suffice. If general versatility is not required, the positions and sizes of the score regions (character areas) may be specified, instead of abstract rule information, by incorporating the arrangement of the score information for a specific broadcast program as a template.
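
As an illustration of checking one such abstract rule, the fragment below tests whether two candidate character areas form a “strike” group, that is, two regions of the same size and the same color that are horizontally adjacent. The region fields and tolerance values are hypothetical and are not taken from the rule information of FIG. 15.

    # Sketch of a rule 902 check for the "strike" group: two regions of the same size and
    # the same color, horizontally adjacent. Field names and tolerances are assumed.
    def is_strike_group(region_a, region_b, pos_tol=5, size_tol=3):
        # Each region: dict with x, y, w, h, and a representative color tuple.
        same_size  = (abs(region_a["w"] - region_b["w"]) <= size_tol
                      and abs(region_a["h"] - region_b["h"]) <= size_tol)
        same_color = region_a["color"] == region_b["color"]
        same_row   = abs(region_a["y"] - region_b["y"]) <= pos_tol
        adjacent   = 0 < region_b["x"] - (region_a["x"] + region_a["w"]) <= 2 * region_a["w"]
        return same_size and same_color and same_row and adjacent

    if __name__ == "__main__":
        a = {"x": 100, "y": 40, "w": 10, "h": 10, "color": (255, 255, 0)}
        b = {"x": 114, "y": 40, "w": 10, "h": 10, "color": (255, 255, 0)}
        print(is_strike_group(a, b))   # True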

The process of generating the video attribute information by the character-area reading unit 157 (at step S7) is explained next. The character-area reading unit 157 reads the video attribute information 11, which is metadata of the video content, from each character area to which meaning is assigned at step S6, and outputs the video attribute information 11. FIG. 16 is a schematic diagram of an example of baseball score information. The score information of a baseball game in this drawing includes the inning, score, ball count, and runners on base. However, only part of such information (for example, information including only the ball count) may be generated. If this is the case, only the relevant information should be subjected to the meaning assignment in the process at step S6.

The process of reading the video attribute information 11 from a score region (character area) differs depending on whether the video attribute information 11 is graphics or characters. For instance, graphics are adopted to show the ball count and runners on base, which are expressed by the number of circles. For the graphics, each score region is checked to determine whether the score character information is in the display state or the nondisplay state. In general, the display state involves high intensity and saturation, and thus the display state may be determined by use of threshold values. Alternatively, the score regions (character areas) of several typical scenes may be sorted into two classes (groups) so that regions with higher intensity and higher saturation can be selected. On the other hand, when the target to be read is characters, a baseball game broadcast employs only digits and the characters for “top” and “bottom”. This means that the types of characters are limited. Hence, templates for these characters are prepared in advance so that characters can be read out by comparison with the templates. Of course, a character recognizing technique adopting a general OCR may be employed so that general versatility is improved. Corrections are added to the score character information (video attribute information 11) read from the score region (character area) in accordance with the rules of the sport so that there will be no inconsistency. For instance, if a score digit becomes smaller when there is no change of innings or the like, it is likely that the digit has been erroneously recognized. If there are any increased or unchanged digits that are read out with high reliability, these digits should be adopted. Furthermore, when an unusual recognition result is obtained from the process of reading graphics, such as the second or third circle of the ball count being lit with the first one unlit, a correction is made by referring to the reading result obtained before or after.
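
The display/nondisplay decision for a graphic score region may be sketched as follows; the intensity and saturation thresholds are assumed values for illustration, and the pixel lists are hypothetical examples rather than data from FIG. 16.

    import colorsys

    # Sketch of reading a graphic score region (e.g. one circle of the ball count):
    # decide display/nondisplay from the average value (intensity) and saturation.
    def is_lit(region_pixels, value_thr=0.6, saturation_thr=0.4):
        s_sum = v_sum = 0.0
        for r, g, b in region_pixels:
            _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            s_sum += s
            v_sum += v
        n = len(region_pixels)
        return (v_sum / n) > value_thr and (s_sum / n) > saturation_thr

    if __name__ == "__main__":
        lit_pixels = [(250, 240, 30)] * 20      # bright yellow circle: display state
        unlit_pixels = [(60, 60, 60)] * 20      # dark gray circle: nondisplay state
        print(is_lit(lit_pixels), is_lit(unlit_pixels))   # True False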

According to the present embodiment, an attribute information area which displays the attribute information therein and makes no change between frames of adjacent scenes that are defined by dividing the video content is extracted. Then, character areas in which individual characters in the attribute information area are present are extracted, and meaning is assigned thereto. The video attribute information, which is metadata of the video content, is read and output from the character areas to which meaning is assigned. This saves the manual operation of designating an image region in which the video attribute information is displayed, where the video attribute information is metadata relating to the video content, such as scores and ball counts in a baseball game broadcast. The video attribute information can be automatically recognized and obtained from the video content, and thus the metadata relating to the video content can be easily generated.

In a sport game broadcast such as a baseball game, images that are shot in the same composition (typical scenes) repeatedly appear, such as a pitching scene in which a pitcher is shot from behind in a direction toward the batter. Such typical scenes are often repeated for every play of the sport, and each time this scene appears, there is some change to the score information such as the score and ball count. According to the present embodiment, typical scenes that are repeated units, such as pitching in a baseball game, are automatically detected from the target video content, and the video attribute information is detected and recognized in accordance with the changes on the screen between the typical scenes. Metadata on scores and ball counts is thereby automatically generated. In this manner, the metadata on the video content can be still more easily generated.

A second embodiment of the present invention is explained with reference to FIGS. 17 and 18. The same components as in the first embodiment are given the same reference numerals, and the explanation thereof is omitted.

FIG. 17 is a block diagram for showing an overview of the structure of a video-attribute-information generating unit 15 according to the second embodiment. As shown in FIG. 17, the video-attribute-information generating unit 15 according to the second embodiment includes an event detecting unit 160 in addition to the structure of the video-attribute-information generating unit 15 according to the first embodiment, and event information 30 detected by the event detecting unit 160 is output to the description-use event selecting unit 18 (see FIG. 2).

An event denotes anything that happens in a game. An event may be a hit, a home-run, or a change of batters. At such an event, a certain change occurs to the score information (video attribute information 11). Thus, the event detecting unit 160 compares the pattern of the change in the score information (video attribute information 11) with event rule information that is prepared in advance. When a certain pattern is observed, the pattern is detected as the event information 30. The event rule information may be “the strike and ball counts return to 0 when there is a hit”, “the strike and ball counts return to 0, while the out count is incremented, when the batter retires”, or “no runner is on base and the score is incremented when there is a home-run”. Such event rule information is prepared for each event that is to be detected, and compared with the video attribute information 11 to output the event information 30.
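
Such a rule check may be sketched as follows. The field names of the score-information records and the ordering of the checks are assumptions for illustration, and the rules below are simplified versions of the examples given above rather than the actual event rule information.

    # Sketch of the event rule check: compare consecutive score-information records
    # (hypothetical field names) against simple change patterns.
    def detect_event(prev, curr):
        counts_cleared = curr["strikes"] == 0 and curr["balls"] == 0
        if counts_cleared and curr["score"] > prev["score"] and not any(curr["runners"]):
            return "home-run"
        if counts_cleared and curr["outs"] == prev["outs"] + 1:
            return "batter retired"
        if counts_cleared and curr["outs"] == prev["outs"]:
            return "hit"
        return None

    if __name__ == "__main__":
        before = {"strikes": 2, "balls": 1, "outs": 1, "score": 2, "runners": [False, False, False]}
        after  = {"strikes": 0, "balls": 0, "outs": 1, "score": 3, "runners": [False, False, False]}
        print(detect_event(before, after))   # home-run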

The score information (video attribute information 11) shown in FIG. 16 is taken as an example. When the score information changes from 1001 to 1002, the strike and ball counts are cleared, while the out count is unchanged. This shows that the result of the pitch at 1001 is a hit. When the score information changes from 1003 to 1004, the strike and ball counts and the on-base entry are all cleared, with the score increased. This shows that the result is a home-run.

The integrating unit 20 thereby sequentially reproduces the typical scenes including the event information 30 selected by the description-use event selecting unit 18. More specifically, the integrating unit 20 receives the time of the typical scene obtained by the typical scene extracting unit 152, and upon receipt of an instruction for skipping, the integrating unit 20 skips the video picture that is being reproduced to the next typical scene. Skipping to the typical scene of the next event may be performed automatically at the timing when the reproduction of the current typical scene is completed, or at the timing of receiving an instruction from outside. When the reproduction is resumed after the skipping, the description preparing unit 19 prepares a description based on the score information (video attribute information 11) corresponding to the typical scene. As discussed above, the description is generated by inserting the score information into a description template (description preparing information 13). The template includes a fixed phrase without the portions corresponding to the score information (video attribute information 11), and the score information (video attribute information 11) is inserted into the blanks. There may be more than one type of template, and the templates may be switched in accordance with the type of the score information (video attribute information 11), the description phrases that are placed before and after the phrase, the development of the game, the user's preference, and the like. For instance, the score information 1005 in FIG. 16 is inserted into a template “in the (top/bottom) of the (-th) inning, with the score (score)-(score), the count (ball)-(strike)”. Then, the phrase “in the bottom of the fifth inning, with the score 2-3, the count 1-1” is obtained. The description prepared in this manner is converted to audible information such as voice by sound synthesis, and reproduced in synchronization with the video picture that is to be reproduced. With such a structure, the user can skip scenes to view only hit scenes and final pitches to batters, so that the viewing hours can be shortened. Furthermore, because the situation of the game is audibly introduced at each scene where the viewing is resumed, the user can understand the development of the game.

During the skipping operation, highlighting may be performed so that changes in the score regions (character areas) can be easily noticed. As illustrated in FIG. 18, when a typical scene including score regions (character areas) 1401 changes to a typical scene including score regions (character areas) 1402 by a skipping operation, the integrating unit 20 highlights the changed score region (character area) by drawing a box around the changed portion, as shown in 1403, so that the change can be easily noticed. Because the difference between the attribute information areas 201 of the two adjacent typical scenes has already been obtained by the character-area extracting unit 155, the region corresponding to the difference is highlighted. The highlighting manner may be suitably determined. The portion may be placed in a box as shown in 1403, changed in color, or caused to blink. With such a display, the viewer can quickly notice which portion is changed when the screen is switched. In this structure, the highlighting is provided when a change occurs at skipping, but it may also be provided during continuous reproduction of the video picture.

According to the present embodiment, event information, which is detailed metadata, can be formed from the video attribute information, and the video picture can be viewed in a short period of time by watching only the scenes where such event information appears.

Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.

1. A video-attribute-information output apparatus comprising: a scene dividing unit that detects a change in scenes where there is little similarity between frames of a video content, and divides the video content into a plurality of scenes; an attribute-information-area extracting unit that extracts an attribute information area in which attribute information is displayed and which has no change between specific frames of adjacent scenes that are obtained by dividing by the scene dividing unit; a character-area extracting unit that extracts character areas in which video attribute information in individual characters that is metadata of the video content is present, from the attribute information area extracted by the attribute-information-area extracting unit; a character-area-meaning assigning unit that assigns meanings to the character areas extracted by the character-area extracting unit, by referring to rule information that specifies the meanings for the character areas; and a video-attribute-information output unit that reads the video attribute information from the character areas to which the meanings are assigned, and outputs the video attribute information.
2. The apparatus according to claim 1, further comprising: a typical scene extracting unit that extracts typical scenes that are images shot in a same composition and repeatedly appear in the video content, from the adjacent scenes obtained by dividing by the scene dividing unit; and a region judging unit that judges whether each of the typical scenes that are extracted by the typical scene extracting unit has the attribute information area extracted by the attribute-information-area extracting unit, wherein the character-area extracting unit extracts character areas in which the individual characters are present in the attribute information area extracted by the attribute-information-area extracting unit and judged as being in a typical scene by the region judging unit.
3. The apparatus according to claim 2, wherein the attribute-information-area extracting unit extracts attribute information areas from a boundary between the typical scene and a scene immediately before the typical scene.
4. The apparatus according to claim 2, wherein the typical scene extracting unit includes: a classifying unit that classifies the scenes obtained by the dividing unit into groups in accordance with similarity based on features of the scenes; and a determining unit that determines all the scenes classified into the group as the typical scenes, when the images shot in the same composition appear in the scenes of a group at a frequency equal to or higher than a criterion level.
5. The apparatus according to claim 4, wherein the criterion level used by the determining unit is whether a total number of images of the group that are shot in the same composition exceeds a predetermined threshold value.
6. The apparatus according to claim 4, wherein the criterion level used by the determining unit is whether a total hours of images of the group that are shot in the same composition accounts for a predetermined percentage or higher of a time length of the video content.
7. The apparatus according to claim 1, wherein the rule information includes contents of meanings, first rules that are brought into correspondence with the contents of the meanings and are to be followed when grouping the character areas in accordance with at least one of a relationship of relative positions and a relationship of colors, and second rules that specify a positional relationship of the character areas; and the character-area-meaning assigning unit selects a content of meaning for a group of character areas that satisfy a second rule from among the character areas grouped in accordance with the first rules, and assigns meaning to the group of the character areas.
8. The apparatus according to claim 1, wherein the video-attribute-information output unit corrects a recognition result when there is any inconsistency in the video attribute information read from the character areas to which the meaning is assigned.
9. The apparatus according to claim 1, further comprising an event detecting unit that detects the video attribute information output from the video-attribute-information output unit, as event information, when the video attribute information is compared with event rule information that includes a relationship between changing patterns of the video attribute information and the events, and a specific pattern is found therein.
10. A video digest forming apparatus comprising: a video-attribute-information output apparatus according to claim 1; an importance calculating unit that calculates a level of importance for each event included in the video attribute information output from the video-attribute-information output apparatus; a video segment selecting unit that selects a video segment of the video content to be included in a digest video picture in accordance with the level of importance calculated by the importance calculating unit; a description-use event selecting unit that selects a description-use event from a list of events to use in preparation of a description that is to be included in the digest video picture; a description preparing unit that prepares the description from the description-use event selected by the description-use event selecting unit; and an integrating unit that combines the selected video segment and the description, and forms digest video information that includes both of the video segment and the description.
11. The apparatus according to claim 10, wherein the integrating unit sequentially outputs the digest video information including the description-use event selected by the description-use event selecting unit, and reproduces the digest video information in a skipping manner.
12. The apparatus according to claim 10, wherein the integrating unit highlights character areas that exhibit changes during reproduction in a skipping manner.
13. A computer program product having a computer readable medium including programmed instructions for outputting video attribute information, wherein the instructions, when executed by a computer, cause the computer to perform: detecting a change in scenes where there is little similarity between frames of a video content, and dividing the video content into a plurality of scenes; extracting an attribute information area in which attribute information is displayed and which has no change between specific frames of the divided adjacent scenes; extracting character areas in which video attribute information in individual characters that is metadata of the video content is present, from the extracted attribute information area; assigning meanings to the extracted character areas by referring to rule information that specifies the meanings for the character areas; and reading the video attribute information from the character areas to which the meanings are assigned, and outputting the video attribute information.
14. A video-attribute-information output method comprising: detecting a change in scenes where there is little similarity between frames of a video content, and dividing the video content into a plurality of scenes; extracting an attribute information area in which attribute information is displayed and which has no change between specific frames of the divided adjacent scenes; extracting character areas in which video attribute information in individual characters that is metadata of the video content is present, from the extracted attribute information area; assigning meanings to the extracted character areas by referring to rule information that specifies the meanings for the character areas; and reading the video attribute information from the character areas to which the meanings are assigned, and outputting the video attribute information.